
Streaming sync serialization #287

Open
bjester wants to merge 9 commits into learningequality:release-v0.9.x from bjester:streaming-sync
Conversation

@bjester (Member) commented Feb 13, 2026

Summary

  • This is step one in a complete revitalization of the sync pipeline
  • Adds new stream utilities for managing the processing of sync data in a streaming fashion
    • I looked at several external libraries, but finding a combination that was simple yet still supported Python 3.6 was a real challenge. The closest I found was streamz, which I unfortunately opted against because it depends on tornado
  • Refactors the _serialize_into_store logic into individual classes built upon the foundational stream utilities, which makes it much better for unit testing
  • Reorganizes some dependent code into locations that allow shared access without circular references
  • Adds typing-extensions for backported future typing features
  • Updates MorangoProfileController to use a sync_filter kwarg instead of filter; it always bothered me that it shadowed the built-in
  • Adds unit tests for the new stream utilities and the converted serialization code; the serialization process as a whole now has pretty good coverage
  • Replaces usage of _serialize_into_store with the new serialize_into_store streaming replacement
  • The new approach does not use bulk_update, as Django was observed to spend excessive time in it
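The chunked, streaming shape described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual Morango API: the names `chunked` and `serialize_into_store` and the per-record save loop are assumptions for the sketch, and real record serialization is replaced by a counter to keep it self-contained.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items, so the full
    record set is never materialized in memory at once."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def serialize_into_store(records, chunk_size=500):
    """Illustrative streaming serializer: records are processed chunk by
    chunk, and each record is handled individually rather than through a
    single bulk_update call. Returns the number of records processed."""
    processed = 0
    for chunk in chunked(records, chunk_size):
        for record in chunk:
            # Real code would serialize `record` and write it to the
            # Store here; we just count to keep the sketch runnable.
            processed += 1
    return processed
```

Because each stage operates on plain chunks, the individual stages can be unit-tested in isolation, which is the main motivation cited above.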

Improvements

The changes were evaluated by installing the local version into Kolibri. Kolibri was launched with a pre-existing database containing data for about 18,000 users. A dedicated command was created within Kolibri to run solely the serialization step, and then the performance of that command was benchmarked.
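A minimal way to take this kind of measurement in plain Python is sketched below. Note this is an approximation of the methodology, not the actual Kolibri command: tracemalloc reports interpreter-traced allocations, which is not the same as the process peak RSS in the table.

```python
import time
import tracemalloc


def benchmark(fn):
    """Run `fn` once and return (result, seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn()
    duration = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, duration, peak


# Compare materializing a large list against streaming over a generator:
total_list, _, peak_list = benchmark(lambda: sum([i for i in range(10**6)]))
total_gen, _, peak_gen = benchmark(lambda: sum(i for i in range(10**6)))
# The streamed version computes the same total with far less peak memory,
# mirroring the memory/duration trade-off shown in the table below.
```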

| | Before | After |
| --- | --- | --- |
| Peak memory | 325.7 MB | 93.5 MB |
| Duration | 12.49 sec | 39.50 sec |

Further investigation will be required to determine how to reduce the increased duration.

How AI was used

  • To look for stream libraries
  • Multiple models/providers were used to prototype the stream utilities
  • To verify and correct type hinting
  • To add comments, edited afterwards
  • To create tests for streaming utilities (simplistic)
  • To bootstrap tests for the serialization stream utils, heavily refactored by me

TODO

  • Have tests been written for the new code?
  • Has documentation been written/updated?
  • Have new dependencies (if any) been added to the requirements file?

Reviewer guidance

  • Install the branch locally into Kolibri and perform some syncs against another local Kolibri instance

Issues addressed

Closes #192

Documentation

@rtibbles rtibbles self-assigned this Feb 24, 2026
@bjester bjester marked this pull request as ready for review February 25, 2026 22:14


Development

Successfully merging this pull request may close these issues.

Add chunking to serializing models into Store
